SUMMARY

The nucleotide sequence of the protective antigen (PA) gene from Bacillus anthracis and its 5' and 3' flanking sequences were determined. PA is one of three proteins comprising anthrax toxin, and its nucleotide sequence is the first to be reported from B. anthracis. The open reading frame (ORF) is 2319 bp long, of which 2205 bp encode the 735 amino acids of the secreted protein. This region is preceded by 29 codons, which appear to encode a signal peptide having characteristics in common with those of other secreted proteins. A consensus TATAAT sequence was located at the putative -10 promoter site. A Shine-Dalgarno site similar to those found in genes of other Bacillus sp. was located 7 bp upstream from the ATG start codon. The codon usage of the PA gene reflected its high A+T content (69%) and differed from that of most other bacterial protein genes examined. The TAA translation stop codon was followed by an inverted repeat forming a potential termination signal. In addition, a 192-codon ORF of unknown significance, theoretically encoding a 21.6-kDa protein, preceded the 5' end of the PA gene.
INTRODUCTION

B. anthracis is an important pathogen of animals and of people exposed to infected animals or their products. It can cause cutaneous anthrax, gastrointestinal anthrax, and an often fatal systemic pulmonary form of the disease (Hambleton et al., 1984; Leppla et al., 1985; Lincoln and Fish, 1970). The two major virulence factors of B. anthracis are a poly-D-glutamic acid capsule and 'anthrax toxin'. DNA functions controlling toxin and capsule production are carried on B. anthracis plasmids pXO1 and pXO2, respectively (Green et al., 1985; Mikesell et al., 1983). The toxin is composed of three separate proteins, PA, EF and LF, none of which is toxic alone. However, PA in combination with LF causes death in rats (Beall et al., 1962), and PA combined with EF produces edema in the skin of guinea pigs and rabbits (Leppla et al., 1985; Lincoln and Fish, 1970). In addition to mediating the toxic effects of LF and EF, protective antigen induces immunity to infection and is the major component of the currently licensed human vaccine (Hambleton et al., 1984; Ivins and Welkos, 1986; Leppla et al., 1985; Little and Knudson, 1986; Turnbull et al., 1986).

To understand the role of PA in the pathogenesis of disease and in the induction of protective immunity, the DNA encoding PA has been cloned and sequenced. All three of the toxin proteins are encoded by the 176-kb plasmid pXO1 (Mikesell et al., 1983; Robertson and Leppla, 1986; Vodkin and Leppla, 1983). Vodkin and Leppla (1983) first reported the cloning of the PA gene in Escherichia coli. The gene was contained in a 6-kb BamHI fragment of pXO1 cloned into plasmid pBR322, and full-size, biologically active PA was produced. The Bacillus promoter was present, but expression of the gene from the recombinant plasmid (pSE36) in E. coli was low.

In a recent study, we subcloned the 6-kb insert of pSE36 into the plasmid vector pUB110 and transformed B. subtilis with the recombinant DNA (Ivins and Welkos, 1986). Two recombinants were isolated which produced large amounts of full-size PA despite deletions of approx. 2.7 kb and 3.4 kb, respectively, in the 6-kb insert. In vitro concentrations of PA produced by the recombinants were similar to or greater than those observed with B. anthracis (Ivins and Welkos, 1986). Protective antigen, a protein of approx. 85 kDa by SDS-PAGE (Ivins and Welkos, 1986; Leppla et al., 1985; Vodkin and Leppla, 1983), requires a coding region of 2-2.5 kb.
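The 2-2.5-kb estimate follows directly from the protein size. The sketch below assumes an average residue mass of about 110 Da (a common rule of thumb, not a value from this study) and adds one stop codon; the function name is hypothetical.

```python
# Rough estimate of the coding-region length needed for a protein of a given size.
# AVG_RESIDUE_MASS_DA is a rule-of-thumb assumption, not a value from the paper.
AVG_RESIDUE_MASS_DA = 110.0


def coding_length_bp(protein_mass_da: float,
                     avg_residue_mass_da: float = AVG_RESIDUE_MASS_DA) -> int:
    """Return the estimated ORF length in bp (sense codons plus one stop codon)."""
    n_residues = round(protein_mass_da / avg_residue_mass_da)
    return 3 * n_residues + 3


for kda in (80, 85, 90):
    print(f"{kda} kDa -> ~{coding_length_bp(kda * 1000)} bp")
```

For proteins of 80-90 kDa this gives roughly 2.2-2.5 kb, in line with the range stated above.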

The purpose of the present study was to map and sequence the coding region of PA. Partial digestion and religation of plasmid pSE36 (which has the 6-kb insert) yielded a smaller derivative plasmid, pPA26, which contains a 4.2-kb insert encoding full-size PA. In this report, the nucleotide sequence of this insert and analysis of the PA-coding region are presented.


MATERIALS  AND  METHODS 

(a)  Bacteria  and  plasmids 

Isolates of E. coli K-12 strain HB101 transformed with pSE36 or pPA26 were the sources of plasmid DNA, and strain JM103 (Messing, 1983) was used to propagate M13 phage derivatives.

(b) Subcloning and detection of protective antigen-producing recombinants

The isolation of recombinant E. coli [pSE36] has been described (Vodkin and Leppla, 1983). Briefly, pSE36 consists of plasmid pBR322 with a 6-kb BamHI fragment encoding the PA protein from plasmid pXO1 of B. anthracis. To obtain derivatives having smaller insert DNA, plasmid pSE36 DNA was partially digested with HindIII and religated. E. coli strain HB101 was transformed with the plasmid DNA, and recombinants were tested for the presence of the PA gene by immunological assay (Vodkin and Leppla, 1983). The size and biological activity of PA produced by the recombinants were tested by a Western-blot procedure and the CHO cell-elongation assay, respectively (Ivins and Welkos, 1986; Leppla, 1984; Vodkin and Leppla, 1983).

(c)  Isolation  of  DNA 

Plasmid pPA26 DNA was prepared from cleared lysates by ultracentrifugation in CsCl-EtdBr gradients according to methods described by Maniatis et al. (1982). The DNA was digested simultaneously with HindIII + BamHI, and the 4.2-kb insert encoding PA was isolated as a 2.2-kb HindIII fragment and a 2.0-kb HindIII-BamHI fragment (see Fig. 1). The DNA fragments were purified by preparative gel electrophoresis, the bands excised, and the DNA extracted with phenol for cloning in M13.
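The fragment sizes expected from a double digest can be checked from the positions of the cut sites. The sketch below assumes complete digestion of a circular plasmid; the site coordinates and the ~4.0-kb vector length are hypothetical round numbers chosen only to reproduce the 2.2-kb and 2.0-kb insert fragments, not mapped positions taken from Fig. 1.

```python
def circular_fragments(plasmid_length_bp: int, cut_sites_bp: list[int]) -> list[int]:
    """Fragment sizes (bp, largest first) after complete digestion of a circular DNA.

    cut_sites_bp are cut positions measured from an arbitrary origin on the circle.
    """
    cuts = sorted({site % plasmid_length_bp for site in cut_sites_bp})
    if not cuts:
        return [plasmid_length_bp]  # no site present: the circle stays intact
    sizes = [b - a for a, b in zip(cuts, cuts[1:])]
    # The last fragment wraps around the origin back to the first cut.
    sizes.append(plasmid_length_bp - cuts[-1] + cuts[0])
    return sorted(sizes, reverse=True)


# Hypothetical map: a ~4.0-kb vector carrying a 4.2-kb HindIII-BamHI insert,
# with HindIII sites at 0 and 2200 and a BamHI site at 4200.
print(circular_fragments(8200, [0, 2200, 4200]))
```

A single cut simply linearizes the circle, which is why one site returns one fragment of full plasmid length.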
(d) Nucleotide sequence analysis

The two fragments were each cloned into phages M13mp10 and M13mp11, and the dideoxy chain-termination method (Messing and Seeburg, 1981; Sanger et al., 1977) was used to sequence the DNA. Initially, data were collected by using the universal primer (Pharmacia P-L Biochemicals, Piscataway, NJ). Using these data, we synthesized oligodeoxynucleotide primers 18 nt long to collect each additional data segment (Sanchez-Pescador and Urdea, 1984). The oligodeoxynucleotides were prepared by the phosphoramidite method (Applied Biosystems, Foster City, CA). The products of the sequencing reactions were separated in 7% denaturing polyacrylamide gels, and data read from the autoradiograms were compiled and merged by using the GEL program in the IntelliGenetics Molecular Biology software package (IntelliGenetics, Inc., Mountain View, CA).
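The primer-walking step (one new 18-nt primer per data segment) can be sketched as follows. The function names, the 50-nt offset from the end of the previous read, and the use of the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C, as a rough melting-temperature check are illustrative assumptions; the paper does not state how primer positions were chosen. In an A+T-rich template such as this one, such a check helps flag primers likely to anneal poorly.

```python
def wallace_tm(primer: str) -> int:
    """Rough Tm (deg C) for short oligos: 2 x (A+T) + 4 x (G+C) (Wallace rule)."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def next_walking_primer(read: str, primer_len: int = 18, offset_from_end: int = 50) -> str:
    """Pick a primer with the same sequence as the read, ending offset_from_end nt
    before the read's 3' end, so that extension continues past the current read.

    offset_from_end is a hypothetical margin to avoid the less reliable end of a gel read.
    """
    end = len(read) - offset_from_end
    start = end - primer_len
    if start < 0:
        raise ValueError("read is too short for the requested primer position")
    return read[start:end].upper()


read = "ACGT" * 30  # toy 120-nt read, not PA sequence
primer = next_walking_primer(read)
print(primer, len(primer), wallace_tm(primer))
```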


(e)  Enzymes  and  reagents 

Restriction endonucleases were purchased from International Biotechnologies, Inc. (New Haven, CT) and Bethesda Research Laboratories (Gaithersburg, MD) and were used as recommended by the suppliers. T4 DNA ligase and deoxynucleoside and dideoxynucleoside triphosphates were from Pharmacia P-L. PolIk (Klenow fragment of DNA polymerase I) was purchased from Boehringer Mannheim Biochemicals (Indianapolis, IN), and [α-32P]deoxynucleoside triphosphates (300-800 Ci/mmol, 11.1-29.6 TBq/mmol) were from Amersham (Arlington Heights, IL).
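The two specific-activity ranges quoted above are the same values in different units. Since 1 Ci is defined as 3.7 × 10^10 Bq, the conversion is a single multiplication; the helper name is illustrative.

```python
BQ_PER_CI = 3.7e10  # definition of the curie


def ci_to_tbq(ci: float) -> float:
    """Convert curies to terabecquerels."""
    return ci * BQ_PER_CI / 1e12


for ci in (300, 800):
    print(f"{ci} Ci/mmol = {ci_to_tbq(ci):.1f} TBq/mmol")
```

This prints 11.1 and 29.6 TBq/mmol, matching the range given in the text.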

Fig. 1. Construction of the plasmid containing the PA gene and sequencing strategy. (A) Plasmid pPA26 was constructed from pSE36 by partial HindIII digestion. The 4.2-kb HindIII-BamHI portion of the plasmid (open box) contains the PA gene. The distances (in kb) between the BamHI (B) and HindIII (H) sites are indicated; EcoRI sites (E) are included. (B) To sequence this insert, pPA26 was digested with BamHI + HindIII. The 2.2-kb HindIII and 2.0-kb HindIII-BamHI fragments were isolated and cloned into M13mp10 and M13mp11. (C) The arrows indicate the direction and extent of sequencing of the DNA fragments, totalling 4235 bp. The hatched bar indicates the structural gene for the mature PA protein.

[Fig. 2 sequence panel: nucleotide sequence of nt 1-4235 with the deduced amino acid sequences of ORF1 and of the PA precursor]

Fig. 2. Nucleotide and deduced amino acid sequence of the PA gene and its 5' and 3' flanking sequences. The sequence shown corresponds to nt 1-4235 on the map in Fig. 1. Restriction endonuclease sites described in Fig. 1 and in RESULTS AND DISCUSSION, section b, are indicated. The presumptive -35 and -10 sequences and the Shine-Dalgarno ribosome-binding site (rbs) of the PA gene and of the potential 192-codon ORF are underlined, as are the start (ATG) and stop (TAG, TAA) codons. Arrows above the nucleotide sequence indicate initiation of translation of the potential ORF upstream from the PA gene (ORF1) and of the signal sequence (SIG) and mature protein (MAT) of the PA gene. The 29-aa signal peptide is underlined. Only the translated amino acid sequences of the 192-codon ORF and the PA gene are shown. The potential stem-loop termination structure flanking the 3' end of the PA gene and the three palindromic sequences on the 5' side of the PA gene are indicated by dashed lines between outward-pointing arrowheads above the sequences.

(f) Computer analysis of nucleotide sequence and protein secondary structure

The B. anthracis DNA sequence in pPA26 was analyzed with several computer software packages. The MOLGENJR programs (Lowe, 1986) were run on an IBM-PC microcomputer to confirm experimental restriction enzyme cleavage patterns, deduce ORFs, and translate the primary nucleotide sequence into amino acid sequences. Other programs from the MOLGENJR package were used to examine the translated peptide sequences, calculate codon usage and polypeptide Mr values, and plot hydropathy and secondary structure histograms. We used a VAX750 minicomputer, executing program SEQ in the IntelliGenetics Molecular Biology software package, to search for regions of dyad symmetry and to calculate free energies of base pairing in potential DNA hairpin secondary structures. Other unpublished programs and algorithms were used to search for potential activation sequences (ENHANCE2.MSB) and significant open reading frames (ORFREAL.MSB) and to create condensed, dot-matrix hydropathy and secondary structure histograms (AGNAKDCF.MSB). These are available from J.R. Lowe.
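The ORF search and translation steps described above can be illustrated with a short sketch. This is not the MOLGENJR or ORFREAL.MSB code; it is a generic forward-strand scanner assuming the standard genetic code, ATG-only starts, and a hypothetical minimum-length threshold (`min_codons`). A full search would also scan the reverse complement.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
STOP_CODONS = {codon for codon, aa in CODON_TABLE.items() if aa == "*"}


def translate(cds: str) -> str:
    """Translate an in-frame DNA sequence; incomplete trailing codons are ignored."""
    cds = cds.upper()
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[int, int, int, int]]:
    """Forward-strand ORFs from an ATG to the first in-frame stop codon.

    Returns (start_nt, end_nt, frame, sense_codons) with 1-based inclusive
    coordinates; end_nt is the last base of the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3  # sense codons, stop excluded
                if n_codons >= min_codons:
                    orfs.append((start + 1, i + 3, frame, n_codons))
                start = None
    return orfs


# Toy sequence, not PA: ATG + 100 Ala codons + TAA, with flanking bases.
toy = "CC" + "ATG" + "GCT" * 100 + "TAA" + "GG"
for start_nt, end_nt, frame, n in find_orfs(toy):
    protein = translate(toy[start_nt - 1:end_nt - 3])
    print(start_nt, end_nt, frame, n, protein[:5])
```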


RESULTS  AND  DISCUSSION 

(a) Subcloning of the PA gene and sequencing strategy

The PA gene of B. anthracis was originally cloned into pBR322 as a 6-kb insert (Vodkin and Leppla, 1983). Partial HindIII digestion and religation of the recombinant plasmid pSE36 yielded a smaller plasmid with a 4.2-kb insert (pPA26), which retained the PA gene (Fig. 1A). E. coli transformants carrying pSE36 or pPA26 both produced proteins of about 83 kDa on SDS-PAGE that reacted specifically with anti-PA antibody on Western-blot analysis and were biologically active in the CHO cell-elongation assay (Leppla, 1984; data not shown). To determine the location and direction of transcription of the PA gene, the 4.2-kb insert was excised, digested with HindIII + BamHI into two fragments of 2.0 kb and 2.2 kb, and sequenced as indicated in Fig. 1B and C.

(b) Analysis of the coding and regulatory regions of the PA gene

The nucleotide sequence of the PA gene is shown in Fig. 2. Analysis of the sequence revealed an ORF 2319 bp long. The structural gene for the mature protein began at nt 1891 with a codon for glutamic acid, and the translated sequence agreed with both the N-terminal amino acid sequence and the amino acid composition determined previously. The coding region for this portion of the PA gene was 2205 bp long, encoding a 735-aa protein with a calculated Mr of 82 684. This size of the mature PA protein, as determined by sequence analysis, was similar to that estimated by SDS-PAGE analysis of PA from Bacillus culture supernatants, 83-85 kDa (Ivins and Welkos, 1986; Leppla et al., 1985; Vodkin and Leppla, 1983). The final residue of the coding region (glycine) was followed by a TAA stop codon (nt 4096). Thus, as indicated in Fig. 1C, all except the N-terminal 53 aa were encoded within the 2.0-kb HindIII-BamHI fragment at the 3' end of the 4.2-kb B. anthracis insert. This location of the gene at the end of the insert confirms the position of the PA gene mapped in recently isolated B. subtilis recombinants (Ivins and Welkos, 1986). In that study, cloning of the B. anthracis insert into B. subtilis (pUB110) yielded two plasmid recombinants with deletions at the 5' end of the insert. The smaller recombinant plasmid retained just 2.6 kb of DNA at the 3' end of the PA insert but produced full-sized, functional PA protein (Ivins and Welkos, 1986).
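The coordinates reported in this section are internally consistent, which can be cross-checked with simple arithmetic. All constants below are taken from the text; the mean residue mass is only a derived sanity check, and the helper name is illustrative.

```python
# Values taken from the text.
MATURE_START_NT = 1891   # first base of the codon for the N-terminal Glu of mature PA
MATURE_CODONS = 735      # 2205 bp / 3
MATURE_MR = 82_684       # calculated Mr of mature PA


def stop_codon_start(first_nt: int, n_codons: int) -> int:
    """Position of the first base of the stop codon that follows n_codons sense codons."""
    return first_nt + 3 * n_codons


print(stop_codon_start(MATURE_START_NT, MATURE_CODONS))  # 4096, the TAA stop codon
print(round(MATURE_MR / MATURE_CODONS, 1))              # mean residue mass, Da
```

The stop codon lands at nt 4096 as reported, and the mean residue mass of about 112.5 Da is typical for proteins.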

Preceding the sequence encoding the 83-kDa PA protein (starting at nt 1891) were two ATG codons in phase with the ORF, at nt 1834 and 1804. Like many other Bacillus proteins, PA is secreted and is therefore probably synthesized as a precursor having a signal peptide. The methionine codon at nt 1804 appears to be the likely translation start. It would initiate a sequence having several characteristics in common with other identified Bacillus signal sequences. The 29-aa peptide that would be encoded is typical in size of other Bacillus signal sequences (Blobel et al., 1979; Kreil, 1981; Lampen and Nielsen, 1982; Mikesell et al., 1983; Wells et al., 1983; Yang et al., 1983). In addition, the positively charged N-terminal 5 aa (Met-Lys-Lys-Arg-Lys), the hydrophobic central region (aa 6-21), and the C-terminal alanine residue are characteristic of bacterial signal peptides (Blobel et al., 1979; Kreil, 1981; Lampen and Nielsen, 1982; Mikesell et al., 1983; Wells et al., 1983; Yang et al., 1983).
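The in-phase relationship of the two upstream ATG codons and the basic features of the proposed signal peptide can be checked directly. The peptide string below is the 29-aa signal sequence as read from the deduced amino acid sequence in Fig. 2, in single-letter code; counting only Lys/Arg as positive and Asp/Glu as negative (His treated as neutral) is a simplifying assumption.

```python
MATURE_START_NT = 1891
UPSTREAM_ATGS = (1834, 1804)
# 29-aa signal peptide as read from Fig. 2 (single-letter code).
SIGNAL_PEPTIDE = "MKKRKVLIPLMALSTILVSSTGNLEVIQA"

# Simplified charges at neutral pH; His treated as neutral (an assumption).
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def in_frame(atg_nt: int, reference_nt: int) -> bool:
    """True if two codon start positions are separated by a whole number of codons."""
    return (reference_nt - atg_nt) % 3 == 0


def net_charge(peptide: str) -> int:
    """Sum of the simplified side-chain charges."""
    return sum(CHARGE.get(aa, 0) for aa in peptide)


for atg in UPSTREAM_ATGS:
    print(atg, in_frame(atg, MATURE_START_NT), (MATURE_START_NT - atg) // 3, "codons")

print(len(SIGNAL_PEPTIDE), net_charge(SIGNAL_PEPTIDE[:5]), SIGNAL_PEPTIDE[-1])
```

Starting at nt 1834 instead would give a 19-aa leader beginning at the second Met, which would lack the Lys-Lys-Arg-Lys cluster at the N terminus.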

[Fig. 3 panels (A)-(D): base-paired stem-loop diagrams; nucleotide numbering as in Fig. 2]

Fig. 3. Possible stem-and-loop structures found in the sequences upstream and downstream from the PA gene and in the putative peptide-coding region. Numbering corresponds with that of Fig. 2. The calculated free energies of these conformations were (A) ΔGf = -22.2 kcal/mol, (B) ΔGf = -25.4 kcal/mol, (C) ΔGf = -19.6 kcal/mol, and (D) ΔGf = -15.8 kcal/mol.

A putative Shine-Dalgarno ribosome-binding site, indicated in Fig. 2, is located 7 bp upstream from the ATG codon at nt 1804. The sequence of this site, AAAGGAG, and its distance from the start codon closely resemble those of the Shine-Dalgarno sites reported for several other Bacillus sp. genes (Duvall et al., 1984; McLaughlin et al., 1982; Ohmura et al., 1983; Waalwijk et al., 1985; Yang et al., 1983). The Shine-Dalgarno sequence has a calculated binding energy with B. subtilis 16S rRNA of -14.0 kcal/mol (Band and Henner, 1984; McLaughlin et al., 1981; Tinoco et al., 1973). Possible promoter sequences are underlined in Fig. 2. The putative RNA polymerase recognition site (TATAAT) at nt 1764 is identical to the E. coli and B. subtilis -10 consensus sequence. The 6-bp sequence starting at nt 1738, separated by 20 bp from the -10 site, resembles the conserved -35 site of E. coli and the -35 site reported for genes of Gram-positive organisms (Rosenberg and Court, 1979). The optimal distance between the -10 and -35 RNA polymerase recognition regions in B. anthracis genes is unknown. In E. coli, these sequences are separated by 16 to 19 bp, with 17 bp being the most frequent spacing and resulting in maximum promoter strength (Rosenberg and Court, 1979). Bacillus promoters, especially those recognized by σ43-containing RNA polymerase, are often similar in sequence and spacing to E. coli promoters; however, several different promoter sequences have been identified (Fliss and Setlow, 1984; Johnson et al., 1983; Waalwijk et al., 1985; Wells et al., 1983). Also, spacings between the two promoter regions as long as 21 bp have been reported for other genes, e.g. the pertussis toxin gene (Locht and Keith, 1986). In vitro and in vivo transcription analyses will be necessary to locate the precise promoter region of the PA gene.
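The spacer argument above reduces to coordinate arithmetic, and candidate -35/-10 pairs can be scanned for in the same way. The sketch below assumes 6-bp boxes, the E. coli consensus sequences TTGACA and TATAAT, up to two mismatches per box, and a 15-21-bp spacer window; these thresholds and function names are illustrative, not the criteria used by the authors.

```python
def spacer_length(minus35_start: int, minus10_start: int, box_len: int = 6) -> int:
    """Number of bases between the end of the -35 box and the start of the -10 box."""
    return minus10_start - (minus35_start + box_len)


def hamming(a: str, b: str) -> int:
    """Mismatches between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def candidate_promoters(seq: str, box35: str = "TTGACA", box10: str = "TATAAT",
                        max_mismatches: int = 2, spacers=range(15, 22)):
    """Return (minus35_start, minus10_start, spacer) triples in 1-based coordinates."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 5):
        if hamming(seq[i:i + 6], box35) > max_mismatches:
            continue
        for spacer in spacers:
            j = i + 6 + spacer
            if j + 6 > len(seq):
                break  # spacers are increasing, so later ones also run off the end
            if hamming(seq[j:j + 6], box10) <= max_mismatches:
                hits.append((i + 1, j + 1, spacer))
    return hits


print(spacer_length(1738, 1764))  # PA coordinates from the text
toy = "GG" + "TTGACA" + "C" * 17 + "TATAAT" + "GG"  # toy sequence, not PA
print(candidate_promoters(toy))
```

With the coordinates from the text the spacer is 20 bp, just outside the 16-19-bp range typical of E. coli promoters.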

An inverted repeat forming a potential termination structure was located 3' of the translation stop codon, as shown in Fig. 3A. The putative hairpin structure, between nt 4142 and 4188, contained 19 complementary bp and two T-G mismatches. The structure had a strongly negative predicted free energy of base-pair formation (ΔGf = -22.2 kcal/mol).
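Counting the base pairs in a proposed stem, while distinguishing Watson-Crick pairs from G-T pairs such as the two in the PA terminator, can be sketched as below. This only scores an already-aligned stem; it does not search for inverted repeats or compute ΔGf, which requires nearest-neighbor stacking energies of the kind used by the SEQ program. The example arms are hypothetical.

```python
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}
GT_PAIRS = {("G", "T"), ("T", "G")}


def score_stem(left_arm: str, right_arm: str) -> tuple[int, int, int]:
    """Pair left_arm[i] with right_arm[-1 - i] (both arms written 5'->3').

    Returns (Watson-Crick pairs, G-T pairs, other mismatches).
    """
    if len(left_arm) != len(right_arm):
        raise ValueError("stem arms must have equal length")
    wc = gt = other = 0
    for a, b in zip(left_arm.upper(), reversed(right_arm.upper())):
        if (a, b) in WATSON_CRICK:
            wc += 1
        elif (a, b) in GT_PAIRS:
            gt += 1
        else:
            other += 1
    return wc, gt, other


# Hypothetical arms: the right arm is the reverse complement of the left arm
# except at one position, which forms a G-T pair.
print(score_stem("AAAGCCTG", "CAGGTTTT"))
```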

We observed three additional regions forming potential stem-and-loop structures with large negative free energies of formation. The first, from nt 868-926 (Fig. 3B), lay inside the 192-codon ORF and had a strongly negative calculated ΔGf of -25.4 kcal/mol. The second region of dyad symmetry (Fig. 3C), from nt 1263-1346, had a predicted ΔGf of -19.6 kcal/mol. The third region (Fig. 3D) spanned the PA promoter from nt 1722-1779 and had a predicted ΔGf of -15.8 kcal/mol. If any or all of these regions are recognized as transcriptional terminators in E. coli, their presence could explain the low PA synthesis from the original clones (5-10 ng PA/ml) (Vodkin and Leppla, 1983).

(c) Base composition and codon usage of the PA gene

The base composition of the coding strand of the PA gene was A = 39%, T = 30% (A+T = 69% of total), G = 17%, and C = 14% (G+C = 31%). The codon usage is shown in Table I. There was a preference for A or T at the third codon position, which may reflect the high A+T content. The codon usage was similar to that of another Bacillus gene of plasmid origin, the crystal protein toxin gene of the related species B. thuringiensis (Schnepf et al., 1985). However, in contrast to the crystal protein gene, the PA gene has no codons for cysteine. The codon usage in the PA gene differed from that in genes for toxins and other proteins produced by other Gram-positive and Gram-negative bacteria (Table I and data not shown).
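Base composition and per-amino-acid codon percentages like those in Table I can be computed from a coding sequence as sketched below. This assumes the standard genetic code, an in-frame sequence starting at its first codon, and that stop codons are excluded from the counts; codons are reported in RNA form (U for T), as in Table I. The test sequence is a toy example, not PA, and the function names are illustrative.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def base_composition(seq: str) -> dict[str, float]:
    """Percentage of A, C, G and T among unambiguous bases."""
    counts = Counter(seq.upper())
    total = sum(counts[b] for b in "ACGT")
    return {b: 100 * counts[b] / total for b in "ACGT"}


def codon_usage(cds: str) -> dict[str, dict[str, float]]:
    """Percent usage of each codon among the codons for its amino acid (stops skipped)."""
    cds = cds.upper()
    by_aa: dict[str, Counter] = defaultdict(Counter)
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        aa = CODON_TABLE.get(codon)
        if aa is None or aa == "*":
            continue  # ambiguous base or stop codon
        by_aa[aa][codon.replace("T", "U")] += 1
    return {aa: {c: 100 * n / sum(cnt.values()) for c, n in cnt.items()}
            for aa, cnt in by_aa.items()}


def third_position_at(cds: str) -> float:
    """Percent of complete codons ending in A or T."""
    cds = cds.upper()
    thirds = [cds[i + 2] for i in range(0, len(cds) - 2, 3)]
    return 100 * sum(b in "AT" for b in thirds) / len(thirds)


toy = "GCTGCTGCAGCGAAAAAG"  # toy coding sequence, not PA
print(base_composition(toy))
print(codon_usage(toy))
print(third_position_at(toy))
```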

(d) Analysis of protein structure from the nucleotide sequence

The prediction of the amino acid sequence of PA and the deduction of protein structural information were performed by algorithms of the computer programs described above (Lowe, 1986, and other unpublished programs). The algorithms used to predict the hydropathic profile and the protein secondary structure are based on the methods of Kyte and Doolittle (1982) and Chou and Fasman (1978), respectively. These predictions for the mature protein and for the putative signal sequence are shown in Fig. 4. The signal sequence (region A of Fig. 4) has a hydrophilic N-terminal end and a hydrophobic central core, as expected from similar analyses of other proteins with confirmed signal sequences (not shown).
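A Kyte-Doolittle profile is a sliding-window average of per-residue hydropathy values. The sketch below applies the published Kyte and Doolittle (1982) scale to the 29-aa signal peptide as read from Fig. 2; the 5-residue window is an illustrative choice suited to a short peptide (longer windows are typical for whole proteins), and the Chou-Fasman secondary-structure prediction is not reproduced here.

```python
# Kyte & Doolittle (1982) hydropathy scale.
KD_SCALE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

SIGNAL_PEPTIDE = "MKKRKVLIPLMALSTILVSSTGNLEVIQA"  # as read from Fig. 2


def hydropathy_profile(seq: str, window: int = 5) -> list[tuple[int, float]]:
    """Sliding-window mean hydropathy as (1-based centre residue, mean) pairs."""
    if window % 2 == 0 or window > len(seq):
        raise ValueError("window must be odd and no longer than the sequence")
    values = [KD_SCALE[aa] for aa in seq.upper()]
    half = window // 2
    return [(i + half + 1, sum(values[i:i + window]) / window)
            for i in range(len(values) - window + 1)]


def mean_hydropathy(seq: str) -> float:
    """Grand average hydropathy of a peptide."""
    return sum(KD_SCALE[aa] for aa in seq.upper()) / len(seq)


print(round(mean_hydropathy(SIGNAL_PEPTIDE[:5]), 2))    # charged N terminus
print(round(mean_hydropathy(SIGNAL_PEPTIDE[5:21]), 2))  # central region, aa 6-21
for centre, score in hydropathy_profile(SIGNAL_PEPTIDE):
    print(f"{centre:3d} {SIGNAL_PEPTIDE[centre - 1]} {score:+.2f}")
```

The N-terminal five residues average strongly negative (hydrophilic) while aa 6-21 average clearly positive (hydrophobic), matching the pattern described for region A.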


[Fig. 4 plot: hydropathy axis from +4 (hydrophobic) to -4 (hydrophilic); secondary-structure columns H, S, T, R; vertical axis, residue position]

Fig. 4. Kyte-Doolittle hydropathic analysis of the 764-residue polypeptide from PA: combination hydropathy/secondary structure plot of the PA protein. The left margin contains the amino acid sequence of PA. Residue numbers are scaled on the right ordinate. The abscissa units are hydropathy values. Dot positions on the left portion of the plot indicate the most probable secondary structure feature predicted. Headings are: H for helix, S for β-sheet, T for β-turn, and R for random coil. Region A is the hydrophobic signal sequence with its highly charged N terminus. Regions B-J are potential antigenic sites in the sequence. Dot-matrix output was computer-generated with program AGNAKDCF.EXE. Algorithms were developed according to the schemes of Kyte and Doolittle (1982) and Chou and Fasman (1978). Polypeptide molecular size = 85 786 Da.
TABLE I

Amino acid composition of PA and codon usage comparisons^a

                              Species and gene^b
               B.a.    B.a.    B.t.    C.t.    C.d.    S.a.    B.p.    E.c.    V.c.
aa^c Codon       PA    ORF1  CryPro  TetTox    DTox    EntB  PToxS3    ToxA    CTxA   n(PA)
Ala  GCU       31.7    12.5    31.3    37.5    42.2    42.9     4.0    28.6    23.1      41
     GCC        4.9     0.0    12.5    12.5    11.1     0.0    56.0     0.0     7.7
     GCA       46.3    75.0    32.8    41.7    20.0    57.1     8.0    50.0    53.8
     GCG       17.1    12.5    23.4     8.3    26.7     0.0    32.0    21.4    15.4
Arg  CGU       10.3     0.0    21.3     5.3    35.3    16.7     0.0    18.2     0.0      29
     CGC        0.0     0.0     5.3     0.0     5.9     0.0    64.3     0.0     0.0
     CGA       10.3    50.0    13.3     5.3    11.8    33.3     0.0     4.5     0.0
     CGG       13.8     0.0     2.7     0.0     5.9    16.7     7.1     9.1     0.0
     AGA       55.2    50.0    44.0    57.9    17.6    33.3     7.1    54.5    33.3
     AGG       10.3     0.0    13.3    31.6    23.5     0.0    21.4    13.6    66.7
Asn  AAU       76.8    71.4    74.4    81.7    80.0    71.4     0.0    87.5    85.7      69
     AAC       23.2    28.6    25.6    18.3    20.0    28.6   100.0    12.5    14.3
Asp  GAU       87.2    93.8    79.4    92.3    71.4    70.8    12.5    62.5   100.0      47
     GAC       12.8     6.2    20.6     7.7    28.6    29.2    87.5    37.5     0.0
Cys  UGU          -       -    64.7    75.0    50.0   100.0     0.0   100.0   100.0       0
     UGC          -       -    35.3    25.0    50.0     0.0   100.0     0.0     0.0
Gln  CAA       83.9   100.0    81.6    84.6    87.5   100.0    30.0    45.5   100.0      31
     CAG       16.1     0.0    18.4    15.4    12.5     0.0    70.0    54.5     0.0
Glu  GAA       74.5   100.0    70.7    82.1    59.5    66.7    71.4    53.8    85.7      51
     GAG       25.5     0.0    29.3    17.9    40.5    33.3    28.6    46.2    14.3
Gly  GGU       11.1    41.7    25.0    35.7    37.0    44.4     9.5    35.0    50.0      36
     GGC        5.6     8.3    12.5    10.7    13.0     0.0    61.9    15.0     0.0
     GGA       52.8    41.6    45.0    50.0    21.7    44.4    14.3    45.0    50.0
     GGG       30.6     8.3    17.5     3.6    28.3    11.1    14.3     5.0     0.0
His  CAU       90.0    66.7    90.9    85.7    58.8    66.7    25.0    50.0    75.0      10
     CAC       10.0    33.3     9.1    14.3    41.2    33.3    75.0    50.0    25.0
Ile  AUU       50.9    50.0    56.3    36.8    36.1    57.1    18.8    38.9    66.7      57
     AUC       17.5     0.0    23.9     1.8    25.0     0.0    56.3     0.0     8.3
     AUA       31.6    50.0    19.7    61.4    38.9    42.9    25.0    61.1    25.0
Leu  UUA       67.7    52.6    45.0    60.4    20.0    52.4     0.0    61.1    33.3      62
     UUG       12.9     5.3    11.0    13.2    15.0    23.8    12.5     5.6    11.1
     CUU        9.7    10.5    22.0    11.3    25.0     4.8     8.3    16.7    11.1
     CUC        3.2     5.3     2.0     0.0    10.0     4.8    20.8     5.6     0.0
     CUA        4.8    21.0    15.0    11.3    20.0     9.5     0.0     0.0    33.3
     CUG        1.6     5.3     7.0     3.8    10.0     4.8    58.3    11.1    11.1
Lys  AAA       78.3    75.0    72.7    92.0    72.5    76.5     0.0    83.3    63.6      60
     AAG       21.7    25.0    27.3     8.0    27.5    23.5   100.0    16.7    36.4
Met  AUG      100.0   100.0   100.0   100.0   100.0   100.0   100.0   100.0   100.0      10
Phe  UUU       79.2    77.8    75.9    95.7    78.9    85.7     0.0    66.7   100.0      24
     UUC       20.8    22.2    24.1     4.3    21.1    14.3   100.0    33.3     0.0


TABLE  I  (continued) 


Species  and  gene** 


aa‘' 

Codon 

B.a. 

PA 

B.a. 

ORFl 

B.t. 

CryPro 

C.t. 

TetTox 

C.d. 

DTox 

S.a. 

EntB 

B.p. 

PToxS3 

E.C. 

ToxA 

V.c. 

CTxA 

Pro 

ecu 

31.0 

50.0 

33.9 

41.2 

39.1 

42.9 

0.0 

7,7 

66.7 

29 

CCC 

10.3 

0.0 

5.4 

5.9 

13.0 

14.3 

18,2 

23.1 

0.0 

CCA 

37.9 

50.0 

42.9 

47.1 

26.1 

42.9 

18.2 

53,8 

33.3 

CCG 

20.7 

0.0 

17.9 

5.9 

21.7 

0.0 

63.6 

15.4 

0.0 

Ser 

ucu 

30.6 

21.4 

19.8 

45.7 

27,8 

43.8 

0.0 

26.3 

22.2 

72 

ucc 

4.2 

0,0 

15.1 

2.2 

7.4 

0.0 

30.0 

15.8 

0.0 

UCA 

20.8 

35.7 

27.9 

23.9 

13.0 

12.5 

0.0 

31.6 

33.3 

UCG 

9.7 

7,1 

7.0 

2.2 

18.5 

18.8 

20.0 

5.3 

11. 1 

AGU 

31.9 

14.3 

22.1 

17.4 

13.0 

18.8 

0.0 

10.5 

33.3 

AGC 

2.8 

21.4 

8.1 

8.7 

20.4 

6.3 

50.0 

10.5 

0.0 

Thr 

ACU 

32.8 

33,3 

28,4 

33.3 

40.0 

64.3 

5.9 

41.7 

30.0 

58 

ACC 

10.3 

II. 1 

16.2 

16.7 

20.0 

0.0 

47.1 

16.7 

10.0 

ACA 

37.9 

55,6 

31.1 

50.0 

23.3 

14.3 

5.9 

41.7 

40.0 

ACG 

19.0 

0,0 

24.3 

0.0 

16.7 

21.4 

41.2 

0.0 

20.0 

Trp 

7 

UGG 

100.0 

100.0 

100.0 

100.0 

100.0 

100.0 

100.0 

100.0 

100.0 

Tyr 

UAU 

78.6 

81.8 

76.9 

88.2 

72.2 

77.3 

42.1 

73.9 

100,0 

28 

UAC 

21.4 

18.2 

23.1 

11,8 

27.8 

22.7 

57.9 

26.1 

0.0 

Val 

GUU 

27.9 

52.6 

25.9 

42,9 

36.2 

42.1 

9.1 

63.6 

33.3 

43 

GUC 

4.7 

0.0 

14.8 

0,0 

12.8 

5.3 

54.5 

9.1 

16.7 

GUA 

39.5 

42.1 

40.7 

50,0 

31.9 

31.6 

18.2 

18.2 

50.0 

GUG 

27.9 

5.3 

18.5 

7.1 

19.1 

21,1 

18.2 

9.1 

0.0 

M, 

85787 

21610 

133047 

65900 

60753 

31433 

24989 

29862 

13909 

“  Within-group  percentage  codon  usage  calculated  with  MOLGENJR  software  package  (Lowe,  1986). 

The  following  genes  from  the  species  listed  were  examined. 

B.a.  PA,  Bacillus  amhracis  protective  antigen  gene  (PA ). 

B.a.  OREL  Bacillus  amhracis  hypothetical  protein  gene  1  on  pXOI  plasmid. 

B. t.  CryPro.  Bacillus  thuringiensis  crystal  protein  gene  (Sanger  and  Coulson,  1977) 

C. t.  TetTox,  Clostridium  letani  tetanus  toxin  gene  (Fairweather  et  al.,  1986). 

C.d.  DTox,  Corynebacterium  diphtheriae  diphtheria  toxin  gene  (Greenfield  et  al.,  1983). 

S.a.  EntB,  Staphylococcus  aureus  enterotoxin  B  gene  (Jones  and  Khan,  1986). 

B.p.  PToxS3,  Bordetella  pertussis  pertussis  toxin  S3  binding  subunit  gene  (Locht  and  Keith,  1986), 

E.c.  ToxA,  Escherichia  coli  heat-labile  enterotoxin  A  gene  (Yamamoto  et  al.,  1984). 

V.c.  CTxA,  Vibrio  cholerae  cholera  toxin  alpha  subunit  gene  (Mekalanos  et  al.,  1983). 

Total  number  of  specific  aa  residues  deduced  from  the  nucleotide  sequence  of  the  PA  gene  is  shown  below  each  aa. 
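The percentages in Table I are within-group values: for each amino acid, the share of its codons that use each synonymous codon. MOLGENJR itself is not reproduced here; the Python sketch below, with hypothetical function names and assuming the standard genetic code, shows the arithmetic. As a consistency check, PA counts of 13, 2, 19 and 7 for GCU, GCC, GCA and GCG (back-calculated from the printed percentages, not given in the paper) reproduce the Ala entries for n = 41. Likewise, the n values in Table I sum to 764, i.e., 735 + 29, which suggests that they include the putative signal peptide.

```python
from collections import Counter

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1): codons enumerated with the
# first, second and third bases each running through T, C, A, G; "*" = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def within_group_usage(cds: str) -> dict[str, dict[str, float]]:
    """Percentage of each codon among all codons for the same amino acid.

    The sequence is read in frame from its first base. Stop codons and codons
    containing characters other than A, C, G, T/U are ignored. An amino acid
    with no codons in the sequence gets 0.0 for every codon (Table I prints
    a dash in that case).
    """
    dna = cds.upper().replace("U", "T")
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    counts = Counter(c for c in codons if CODON_TABLE.get(c, "*") != "*")

    usage: dict[str, dict[str, float]] = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            # Report codons in RNA notation (U for T), as in Table I.
            usage.setdefault(aa, {})[codon.replace("T", "U")] = float(counts[codon])
    for group in usage.values():
        n = sum(group.values())
        for codon in group:
            group[codon] = 100.0 * group[codon] / n if n else 0.0
    return usage


if __name__ == "__main__":
    # Ala-only example; the counts 13, 2, 19, 7 are back-calculated from the
    # PA column of Table I (n = 41) and are not reported in the source.
    example = "GCT" * 13 + "GCC" * 2 + "GCA" * 19 + "GCG" * 7
    print({c: round(p, 1) for c, p in within_group_usage(example)["A"].items()})
    # {'GCU': 31.7, 'GCC': 4.9, 'GCA': 46.3, 'GCG': 17.1}
```

Because each group is normalized separately, the columns for different genes can be compared directly even though the genes differ greatly in length.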


(e) Regions of the sequence upstream from the PA gene

Other ORFs, in addition to the longest one of 2319 bp encoding PA, were found in the 4.2-kb sequence. The only ORF at least 100 codons long was a 576-nt sequence (ORF1) beginning with an ATG at nt position 416 upstream of the PA gene. This 192-codon ORF encodes a polypeptide with a calculated molecular mass of 21 610 Da. The codon usage of the translated region is similar to that observed for the PA gene (Table I). A computer analysis (ORFREAL.MSB) of this ORF according to the method of Fickett (1982) gave a 92% coding probability; a similar analysis of the PA-coding region also gave a 92% coding probability. Potential -10 and -35 RNA polymerase recognition sites, but no consensus Shine-Dalgarno site, were identified on the 5' side of the cryptic ORF. The ORF terminated with a TAG stop codon. Fig. 5 shows a hydropathy plot and secondary structure analysis of the putative protein. The sequence does not appear to encode a signal peptide, but it does have a C terminus rich in hydrophobic residues, embedded in a region with a high probability of β-sheet structure. This suggests that the protein could be membrane-bound at its C terminus. The significance of this putative gene is unknown and awaits expression experiments using the cloned plasmid DNA.

Fig. 5. Kyte-Doolittle hydropathic analysis of the 192-residue polypeptide from PA26ORF1. Combination hydropathy/secondary structure
plot of the putative peptide. Plot organization and generation are the same as described in Fig. 4. Region A is a long, highly hydrophobic
28-aa sequence with mostly β-sheet structure predicted. Regions B-D are potential antigenic sites in the sequence. Polypeptide molecular
size = 21 609 Da.
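The search for ORFs of at least 100 codons described in (e) can be sketched as a three-frame scan for ATG-to-stop stretches. The Python below is a minimal sketch under stated assumptions, not the program used in the study: the function names are hypothetical, only ATG starts and the three standard stop codons are considered, and codon counts exclude the stop codon (consistent with the 576-nt ORF1 being described as 192 codons encoding a 192-residue polypeptide). Fickett's (1982) coding-probability statistic is not reproduced.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence (other characters kept as-is)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[int, int, int]]:
    """ATG-initiated ORFs in the three forward reading frames of `seq`.

    Returns (start, end, n_codons) tuples, longest first: `start` is the
    0-based index of the ATG, `end` is the index just past the stop codon,
    and `n_codons` counts sense codons only (the stop codon is excluded).
    Each ORF begins at the first ATG after the previous in-frame stop;
    ORFs that run off the end of `seq` without a stop are not reported.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i
            elif codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start, i + 3, n_codons))
                start = None
    return sorted(orfs, key=lambda orf: orf[2], reverse=True)


if __name__ == "__main__":
    # Toy sequence: ATG + four GCT codons + TAA stop, padded on both sides.
    toy = "CC" + "ATG" + "GCT" * 4 + "TAA" + "GG"
    print(find_orfs(toy, min_codons=5))  # [(2, 20, 5)]
    # Scan the other strand by passing its reverse complement; coordinates
    # are then relative to the reverse-complemented sequence.
    print(find_orfs(reverse_complement(toy), min_codons=5))
```

Taking the first ATG after each stop yields the longest possible ORF for that stop codon; whether the original search also reported nested or non-ATG starts is not stated.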

(f) Conclusions

The nucleotide sequence of the ORF encoding PA, one of the three protein components of anthrax toxin, was determined. A region encoding a putative 29-codon signal sequence, regulatory sequences upstream from the PA gene, and a newly identified ORF were also deduced. The codon usage in the PA gene differed from that in the genes for the other bacterial proteins compared, except for the crystal protein toxin gene of the related species B. thuringiensis. The availability of the complete nucleotide sequence of the PA gene of B. anthracis will serve several useful purposes. For example, the PA promoter sequence is being analyzed with promoter-probe vectors, and the sequence is being altered by site-specific mutagenesis; this should make enhanced production of cloned PA in B. subtilis or E. coli hosts feasible. Also, specific mutagenesis of the PA-coding region will be done to (1) produce immunogenic, biologically inactive, cross-reactive proteins for vaccine studies (Hambleton et al., 1984; Little and Knudson, 1986; Turnbull et al., 1986); and (2) examine the role of different domains of PA in binding to target cell membranes and to the EF and LF components of anthrax toxin (Friedlander, 1986; Leppla, 1984; O'Brien et al., 1985). Finally, segments of the PA nucleotide sequence will be used as probes to examine the genetic organization of PA in variant strains of B. anthracis (Little and Knudson, 1986; Turnbull et al., 1986).